Provable training set debugging for linear regression

Abstract

We investigate problems in penalized M-estimation, inspired by applications in machine learning debugging. Data are collected from two pools, one containing data with possibly contaminated labels, and the other which is known to contain only cleanly labeled points. We first formulate a general statistical algorithm for identifying buggy points and provide rigorous theoretical guarantees when the data follow a linear model. We then propose an algorithm for tuning parameter selection of our Lasso-based algorithm, with theoretical guarantees. Finally, we consider a two-person “game” played between a bug generator and a debugger, where the debugger can augment the contaminated data set with cleaned versions of points in the original data pool. We develop and analyze a debugging strategy in terms of a Mixed Integer Linear Programming (MILP) problem. Our empirical results verify the utility of the MILP strategy.
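
To make the Lasso-based idea concrete, here is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a mean-shift model in which each possibly contaminated label carries its own shift term gamma_i, penalized by an L1 norm, while the clean pool has no shift. The function names, the alternating-minimization solver, the penalty level `lam`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(r, lam):
    """Elementwise soft-thresholding, the proximal map of lam * |.|."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def flag_buggy_points(X_sus, y_sus, X_clean, y_clean, lam, n_iter=200):
    """Hypothetical helper: minimize over (beta, gamma)
        0.5 * ||y - X beta - [gamma; 0]||^2 + lam * ||gamma||_1
    by alternating exact minimization. gamma has one entry per point in
    the possibly contaminated pool; clean points get no shift term.
    """
    X = np.vstack([X_sus, X_clean])
    y = np.concatenate([y_sus, y_clean])
    gamma = np.zeros(X_sus.shape[0])
    pinv = np.linalg.pinv(X)  # reused for every least-squares step
    zeros_clean = np.zeros(X_clean.shape[0])
    for _ in range(n_iter):
        # beta step: ordinary least squares on shift-corrected labels
        beta = pinv @ (y - np.concatenate([gamma, zeros_clean]))
        # gamma step: soft-threshold residuals of the suspect pool
        gamma = soft_threshold(y_sus - X_sus @ beta, lam)
    # Points with a nonzero estimated shift are flagged as buggy
    return beta, gamma, np.flatnonzero(gamma)

# Synthetic illustration (all values are made up for this sketch)
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
X_sus = rng.normal(size=(40, 3))
X_clean = rng.normal(size=(20, 3))
y_sus = X_sus @ beta_true + 0.01 * rng.normal(size=40)
y_clean = X_clean @ beta_true + 0.01 * rng.normal(size=20)
y_sus[3] += 5.0    # inject two label bugs into the suspect pool
y_sus[17] -= 5.0

beta_hat, gamma_hat, flagged = flag_buggy_points(
    X_sus, y_sus, X_clean, y_clean, lam=1.0
)
print("flagged indices:", flagged)
```

The objective is jointly convex in `beta` and `gamma`, so alternating the two exact updates is a simple way to solve it; any standard Lasso solver could be used instead. How to choose `lam` is the tuning-parameter question the abstract refers to, and this sketch does not address it.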

Similar resources

Training Set Debugging Using Trusted Items

Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for manual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning. Specificall...

Fast Active-set-type Algorithms for L1-regularized Linear Regression

In this paper, we investigate new active-set-type methods for l1-regularized linear regression that overcome some difficulties of existing active set methods. By showing a relationship between l1-regularized linear regression and the linear complementarity problem with bounds, we present a fast active-set-type method, called block principal pivoting. This method accelerates computation by allowi...

Interval linear regression

In this paper, we have studied the analysis of an interval linear regression model for fuzzy data. In section one, we have introduced the concepts required in this thesis and then we illustrated linear regression fuzzy sets and some primary definitions. In section two, we have introduced various methods of interval linear regression analysis. In section three, we have implemented nu...

Debugging Inconsistent Answer Set Programs

In this paper we examine how we can find contradictions from Answer Set Programs (ASP). One of the most important phases of programming is debugging, finding errors that have crept in during program implementation. Current ASP systems are still mostly experimental tools and their support for debugging is limited. This paper addresses one part of ASP debugging, finding the reason why a program d...

Journal

Journal title: Machine Learning

Year: 2021

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-021-06040-4